pronunciation dictionary
The Development of a Comprehensive Spanish Dictionary for Phonetic and Lexical Tagging in Socio-phonetic Research (ESPADA)
Pronunciation dictionaries are an important component of speech forced alignment. Their accuracy strongly affects the aligned speech data, since they mediate the mapping between orthographic transcriptions and acoustic signals. In this paper, I present the creation of a comprehensive Spanish pronunciation dictionary (ESPADA) that can be used with data from most dialectal variants of Spanish. Current dictionaries focus on specific regional variants, but the flexible nature of this tool allows it to capture the most common phonetic differences across major dialectal variants. I propose improvements to current pronunciation dictionaries as well as the mapping of other relevant annotations, such as morphological and lexical information. In terms of size, it is currently the most complete dictionary of its kind, with more than 628,000 entries representing words from 16 countries. All entries come with their corresponding pronunciations, morphological and lexical tags, and other information relevant to phonetic analysis: stress patterns, phonotactics, IPA transcriptions, and more. The aim is to equip socio-phonetic researchers with a complete open-source tool that enhances dialectal research on the Spanish language within socio-phonetic frameworks.
- Europe > Austria > Vienna (0.14)
- North America > Mexico > Campeche (0.04)
- South America > Venezuela (0.04)
- (21 more...)
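As a rough illustration of the per-entry annotations the ESPADA abstract describes (pronunciation, stress, morphological tag, IPA), here is a minimal sketch of a dictionary entry and lookup in Python. The field names and sample values are illustrative assumptions, not ESPADA's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DictEntry:
    """One pronunciation-dictionary entry (field names are illustrative)."""
    orthography: str      # written form
    ipa: str              # broad IPA transcription
    syllables: List[str]  # syllabified pronunciation
    stress_index: int     # 0-based index of the stressed syllable
    pos: str              # part-of-speech / morphological tag

# A toy two-entry lexicon in the spirit of ESPADA's annotations.
LEXICON: Dict[str, DictEntry] = {
    "casa":   DictEntry("casa",   "ˈkasa",   ["ka", "sa"],   0, "NOUN"),
    "cantar": DictEntry("cantar", "kanˈtaɾ", ["kan", "taɾ"], 1, "VERB"),
}

def lookup(word: str) -> DictEntry:
    """Return the entry for a word, raising KeyError for OOV items."""
    return LEXICON[word.lower()]

if __name__ == "__main__":
    entry = lookup("cantar")
    print(entry.ipa, entry.syllables[entry.stress_index])  # kanˈtaɾ taɾ
```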
Large Vocabulary Spontaneous Speech Recognition for Tigrigna
Kahsu, Ataklti, Teferra, Solomon
This thesis proposes and describes a research attempt at designing and developing a speaker-independent spontaneous automatic speech recognition system for Tigrigna. The acoustic model of the speech recognition system is developed using the Carnegie Mellon University automatic speech recognition development toolkit (Sphinx), while the SRILM toolkit is used for the development of the language model. Keywords: Automatic Speech Recognition, Tigrigna language.
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)
- North America > United States > New Jersey (0.04)
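The thesis pairs a Sphinx acoustic model with an n-gram language model. As a minimal sketch of what language-model estimation involves, the following toy bigram model with add-one smoothing is illustrative only; it does not reflect the thesis's actual SRILM configuration, and the example sentences are placeholders rather than real Tigrigna data.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate bigram probabilities with simple add-one smoothing."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

# Toy usage with placeholder sentences standing in for a training corpus.
prob = train_bigram_lm(["selam alem", "selam adey"])
print(prob("<s>", "selam"))
```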
My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech
Pradhan, Sameer S., Cole, Ronald A., Ward, Wayne H.
This article describes the MyST corpus developed as part of the My Science Tutor project -- one of the largest collections of children's conversational speech, comprising approximately 400 hours and spanning some 230K utterances across about 10.5K virtual tutor sessions by around 1.3K third, fourth and fifth grade students. About 100K of these utterances have been transcribed thus far. The corpus is freely available (https://myst.cemantix.org) for non-commercial use under a Creative Commons license. It is also available for commercial use (https://boulderlearning.com/resources/myst-corpus/). To date, ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded it. It is our hope that the corpus can be used to improve automatic speech recognition algorithms, to build and evaluate conversational AI agents for education, and thereby to help accelerate the development of multimodal applications that improve children's excitement about and learning of science and help them learn remotely.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > Colorado > Boulder County > Boulder (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
Ribeiro, Manuel Sam, Comini, Giulia, Lorenzo-Trueba, Jaime
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, these tend to rely on manually annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation. Given the hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words, and we use those to re-train the G2P system. Results indicate that our approach consistently improves the phone error rate of G2P systems across languages and amounts of available data.
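A minimal sketch of the bootstrapping idea described above: seed a toy G2P from a small lexicon, add hypothesized pronunciations for out-of-vocabulary words (standing in for phone-recognizer output on speech recordings), and retrain. The trivial character-mapping "model" and the toy data are assumptions for illustration, not the paper's neural G2P.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[str]]

def train_g2p(lexicon: Lexicon):
    """Toy 'G2P model': per-character mapping learned from the lexicon."""
    # For illustration only; a real system would use a neural sequence model.
    char_map: Dict[str, str] = {}
    for word, phones in lexicon.items():
        for ch, ph in zip(word, phones):      # naive 1:1 alignment
            char_map.setdefault(ch, ph)
    return lambda w: [char_map.get(ch, ch) for ch in w]

def bootstrap(seed: Lexicon, decoded: List[Tuple[str, List[str]]], rounds: int = 2):
    """Expand the lexicon with hypothesized pronunciations of OOV words, then retrain."""
    lexicon = dict(seed)
    g2p = train_g2p(lexicon)
    for _ in range(rounds):
        # 'decoded' stands in for (word, phone-sequence) pairs obtained by running
        # a phone recognizer over speech recordings, as in the paper's pipeline.
        for word, phones in decoded:
            if word not in lexicon:           # learn OOV pronunciations
                lexicon[word] = phones
        g2p = train_g2p(lexicon)              # re-train on the expanded lexicon
    return g2p, lexicon

seed = {"cat": ["k", "æ", "t"]}
g2p, lex = bootstrap(seed, [("cab", ["k", "æ", "b"])])
print(g2p("bat"))  # uses mappings learned from both seed and hypothesized entries
```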
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Do, Phat, Coler, Matt, Dijkstra, Jelske, Klabbers, Esther
We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the applicability of the findings. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve output quality, with the latter performing better, but these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family-tree-based distance measure as a criterion for selecting source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
- Europe > Netherlands (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
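The abstract above uses Angular Similarity of Phone Frequencies (ASPF) as a source-language selection criterion. A minimal sketch, assuming the standard angular-similarity formula over phone-frequency vectors (1 − 2·arccos(cos θ)/π); the paper's exact normalization and frequency estimation may differ.

```python
import math
from collections import Counter

def aspf(phones_a, phones_b):
    """Angular similarity between two phone-frequency vectors.

    Assumes the usual angular-similarity form for non-negative vectors:
    1 - 2*arccos(cosine_similarity)/pi. Illustrative only.
    """
    fa, fb = Counter(phones_a), Counter(phones_b)
    phones = set(fa) | set(fb)
    dot = sum(fa[p] * fb[p] for p in phones)
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    cos = dot / (norm_a * norm_b)
    return 1.0 - 2.0 * math.acos(min(1.0, cos)) / math.pi

# Toy phone sequences; frequencies are implied by repetition.
print(aspf(["a", "a", "t", "k"], ["a", "t", "t", "s"]))
```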
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
Do, Phat, Coler, Matt, Dijkstra, Jelske, Klabbers, Esther
We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperforms using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a) using a massively multilingual model for grapheme-to-phone (G2P) conversion in both training and synthesis, and b) using a universal phone recognizer to create a makeshift dictionary. Results show that the G2P approach performs largely on par with using a ground-truth dictionary, while the phone recognition approach, although generally worse, remains a viable option for LRLs less suited to the G2P approach. Within each approach, using articulatory features as input outperforms using phone labels.
- Europe > Netherlands (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.63)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
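To illustrate the difference between phone-label input and articulatory-feature input discussed in the abstract above, here is a minimal sketch that replaces each phone with a feature vector drawn from a small lookup table. The feature inventory and values are simplified assumptions, not the feature set used in the paper.

```python
from typing import Dict, List

# Illustrative articulatory feature table (simplified assumptions).
FEATURES: Dict[str, Dict[str, int]] = {
    "p": {"voiced": 0, "nasal": 0, "labial": 1, "vowel": 0, "high": 0},
    "b": {"voiced": 1, "nasal": 0, "labial": 1, "vowel": 0, "high": 0},
    "m": {"voiced": 1, "nasal": 1, "labial": 1, "vowel": 0, "high": 0},
    "i": {"voiced": 1, "nasal": 0, "labial": 0, "vowel": 1, "high": 1},
    "a": {"voiced": 1, "nasal": 0, "labial": 0, "vowel": 1, "high": 0},
}
FEATURE_ORDER = ["voiced", "nasal", "labial", "vowel", "high"]

def phones_to_features(phones: List[str]) -> List[List[int]]:
    """Replace each phone label with its articulatory feature vector."""
    return [[FEATURES[p][f] for f in FEATURE_ORDER] for p in phones]

# A TTS encoder can consume these vectors instead of one-hot phone labels.
print(phones_to_features(["b", "a", "m", "i"]))
```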
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
Wang, Lei, Tong, Rong, Leung, Cheung Chi, Sivadas, Sunil, Ni, Chongjia, Ma, Bin
This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As little existing work has been carried out on these regional languages, several difficulties must be addressed before building the systems: limited speech and text resources, a lack of linguistic knowledge, and so on. This work takes Bahasa Indonesia and Thai as examples to illustrate strategies for collecting the various resources required to build ASR systems.
- Asia > Singapore (0.05)
- Oceania > Australia > Queensland > Brisbane (0.05)
- Asia > Myanmar (0.05)
- (14 more...)
Multi-Module G2P Converter for Persian Focusing on Relations between Words
Rezaei, Mahdi, Nayeri, Negar, Farzi, Saeed, Sameti, Hossein
G2P systems aim to convert a grapheme (letter) sequence into its pronunciation sequence, and are an essential component of text-to-speech (TTS) and speech recognition systems for any language lacking consistent pronunciation rules. A good G2P system must address the issues of out-of-vocabulary (OOV) words and cross-word relations. OOV words are those which are not present in the lexicon, meaning they were not seen during model training. In the case of G2P, the lexicon is a dictionary consisting of graphemes and their respective phonemes. As for cross-word relations, a Persian G2P task is mainly concerned with homographs and ezafe constructions. In this paper, we investigate the application of end-to-end and multi-module frameworks for G2P conversion for the Persian language. The results demonstrate that our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed. The system consists of a pronunciation dictionary as our look-up table, along with separate models to handle homographs, OOVs and ezafe in Persian, created using GRU and Transformer architectures. The system is sequence-level rather than word-level, which allows it to effectively capture the unwritten relations between words (cross-word information) necessary for …
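A minimal sketch of the multi-module dispatch described above: dictionary look-up first, a fallback model for OOV words, and a sequence-level rule standing in for the ezafe module. The toy entries, the naive OOV "model", and the ezafe rule are illustrative assumptions, not the paper's GRU/Transformer components.

```python
from typing import Dict, List

# Toy stand-ins for the paper's components; entries and rules are illustrative.
LOOKUP: Dict[str, str] = {"ketab": "k e t A b", "man": "m a n"}

def oov_model(word: str) -> str:
    """Placeholder for a trained OOV model: naive letter-to-phone mapping."""
    return " ".join(word)

def add_ezafe(phones: List[str], i: int, words: List[str]) -> List[str]:
    """Toy cross-word rule: insert the ezafe vowel before a following word."""
    if i + 1 < len(words):               # a real system would use a sequence model
        return phones + ["e"]
    return phones

def g2p_sequence(words: List[str]) -> List[str]:
    out = []
    for i, w in enumerate(words):
        phones = (LOOKUP[w] if w in LOOKUP else oov_model(w)).split()
        phones = add_ezafe(phones, i, words)   # sequence-level, not word-level
        out.append(" ".join(phones))
    return out

print(g2p_sequence(["ketab", "man"]))  # ['k e t A b e', 'm a n']
```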
Researchers teach a computer to compose sonnets like Shakespeare
In addition to penning 37 plays, William Shakespeare was a prolific composer of sonnets -- crafting 154 of them during his life. Now, more than 400 years after his death, the Bard's words are influencing a new generation of poets. It's just that these writers do so with silicon imaginations and digital quills. A consortium of researchers from the University of Toronto, the University of Melbourne and IBM's Australia division has managed to teach a neural network to craft sonnets just as the Bard did in the 16th century, using his own words to teach the machine. They published their results at the 2018 ACL conference, and you can play around with the network itself over at GitHub.